Search results for "Universal similarity metric"

showing 3 items of 3 documents

Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

2007

Abstract Background Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rath…

Computer scienceAlgorismesPrediction by partial matchingCompression dissimilaritycomputer.software_genreBiochemistryProtein Structure SecondaryPhylogenetic studiesStructural BiologySequence Analysis ProteinDatabases Proteinlcsh:QH301-705.5Biological dataNCDApplied MathematicsGenomicsClassificationCDComputer Science ApplicationsBenchmarking:Informàtica::Informàtica teòrica [Àrees temàtiques de la UPC]Universal compression dissimilarityArea Under CurveMetric (mathematics)lcsh:R858-859.7Data miningAlgorithmsData compressionResearch Article:Informàtica::Aplicacions de la informàtica::Bioinformàtica [Àrees temàtiques de la UPC]Normalization (statistics)lcsh:Computer applications to medicine. Medical informaticsBioinformatics Sequence Alignment AlgorithmsSet (abstract data type)Similarity (network science)Normalized compression sissimilarityData compression (Computer science)AnimalsHumansAmino Acid SequenceMolecular BiologyBiologyDades -- Compressió (Informàtica)USMUniversal similarity metricProteinsUCDProtein Structure TertiaryData setGenòmicaStatistical classificationlcsh:Biology (General)ROC CurvecomputerSequence AlignmentSoftwareBMC bioinformatics
researchProduct

Comparison of genomic sequences clustering using Normalized Compression Distance and Evolutionary Distance

2008

Genomic sequences are usually compared using evolutionary distance, a procedure that implies the alignment of the sequences. Alignment of long sequences is a long procedure and the obtained dissimilarity results is not a metric. Recently the normalized compression distance was introduced as a method to calculate the distance between two generic digital objects, and it seems a suitable way to compare genomic strings. In this paper the clustering and the mapping, obtained using a SOM, with the traditional evolutionary distance and the compression distance are compared in order to understand if the two distances sets are similar. The first results indicate that the two distances catch differen…

Kolmogorov complexityuniversal similarity metricComputer sciencebusiness.industryDNA sequencePattern recognitionGenomic Sequence ClusteringCompression (functional analysis)Normalized compression distanceArtificial intelligenceCluster analysisbusinessDistance matrices in phylogenyclustering
researchProduct

Normalised compression distance and evolutionary distance of genomic sequences: comparison of clustering results

2009

Genomic sequences are usually compared using evolutionary distance, a procedure that implies the alignment of the sequences. Alignment of long sequences is a time consuming procedure and the obtained dissimilarity results is not a metric. Recently, the normalised compression distance was introduced as a method to calculate the distance between two generic digital objects and it seems a suitable way to compare genomic strings. In this paper, the clustering and the non-linear mapping obtained using the evolutionary distance and the compression distance are compared, in order to understand if the two distances sets are similar.

Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazionibusiness.industryCompression (functional analysis)Metric (mathematics)Normalized compression distanceuniversal similarity metric USM clustering DNA sequences normalised compression distance evolutionary distance genomic sequences nonlinear mapping bioinformaticsPattern recognitionArtificial intelligenceCluster analysisbusinessDistance matrices in phylogenyMathematics
researchProduct